Knowledge Discovery in Spatial Databases
Authors
Abstract
Knowledge discovery in databases is a complex process concerned with the discovery of relationships and other descriptions from data. Knowledge discovery in spatial databases represents a particular case of discovery, allowing the discovery of relationships that exist between spatial and non-spatial data, and of other data characteristics that are not explicitly stored in spatial databases. This paper describes the conception and implementation of PADRÃO, a system for knowledge discovery in spatial databases. PADRÃO presents a new approach to this process, which is based on qualitative spatial reasoning. The spatial semantic knowledge and the principles of qualitative spatial reasoning needed for the spatial reasoning process are available in PADRÃO’s geographic database and spatial knowledge base, allowing the integration of the geo-spatial component, associated with the analysed non-geographic data, in the process of knowledge discovery.

Maribel Santos, Luís Amaral
Information Systems Department and Algoritmi Research Centre
University of Minho, Campus de Azurém
4800-019 Guimarães, PORTUGAL
Tel.: +351 253 510259 Fax: +351 253 510250
e-mail: {maribel, amaral}@dsi.uminho.pt

Knowledge Discovery in Spatial Databases: the PADRÃO’s qualitative approach

INTRODUCTION

Large amounts of operational data covering several years of operation are now becoming available, mainly in medium-sized and large organisations. Knowledge Discovery in Databases (KDD) is the key to accessing the strategic value of the organisational knowledge buried in databases, usable for daily operation, general management and strategic planning. The process of KDD automates the discovery of relationships and other descriptions from data. Data mining is one of the steps of this process, concerned with the application of specific algorithms for extracting patterns from data (Fayyad et al., 1996b).

The main recognised advances in the area of KDD (Fayyad et al., 1996a) relate to the exploration of relational databases. However, most organisational databases contain one dimension of data, the geographic one (associated with addresses or postcodes), whose semantics is not used by traditional KDD systems. Knowledge Discovery in Spatial Databases (KDSD) concerns “the extraction of interesting spatial patterns and features, general relationships that exist between spatial and non-spatial data, and other data characteristics not explicitly stored in spatial databases” (Koperski and Han, 1995).

Spatial database systems are normally relational databases plus a concept of spatial location and spatial extension (Ester et al., 1997). The explicit location and extension of objects define implicit relations of spatial neighbourhood. The neighbour attributes of a given object may influence its behaviour and therefore must be considered in the process of knowledge discovery.
Knowledge discovery in relational databases does not take this spatial reasoning into consideration, which motivates the development of new algorithms adapted to the characteristics of spatial data.

The main approaches in KDSD are characterised by the development of new algorithms that treat the objects’ position and extension through the manipulation of their co-ordinates (Ester et al., 1998; Lu et al., 1993; Koperski and Han, 1995; Koperski et al., 1998). These algorithms are subsequently implemented, extending traditional knowledge discovery systems. In all of them, a quantitative spatial reasoning approach is used, although the results are presented using qualitative identifiers (such as far, close, North, ...).

This paper describes the conception and implementation of PADRÃO, a system for KDSD. PADRÃO presents a new approach to the process of KDSD based on qualitative spatial reasoning and was implemented using a traditional knowledge discovery system. The spatial semantic knowledge and the principles of qualitative spatial reasoning needed for the spatial reasoning process are available in PADRÃO’s geographic database and spatial knowledge base, allowing the integration of the geo-spatial component of the data in the process of knowledge discovery. The integration of a geographic database, with the administrative subdivisions of Portugal at the municipality and district level, and a demographic database, storing the parish registers of one district of Portugal, allowed PADRÃO to discover implicit relationships between the analysed geographic and demographic data.

This paper is organised in several sections. Qualitative spatial reasoning is defined first, together with a description of how its concepts are used in the knowledge discovery process. The architecture of PADRÃO is then presented, describing its main components, the several steps associated with it, and its implementation.
The application of PADRÃO to the demographic domain is also illustrated, indicating the types of discoveries that can be achieved with it.

QUALITATIVE SPATIAL REASONING

The positional aspects of geographic data are provided by a spatial reference, which relates the data to a given position on the Earth’s surface. Spatial references fall into two categories: those based on co-ordinates and those based on geographic identifiers. In systems of spatial referencing using geographic identifiers (indirect referencing systems), a position is referenced to a real world location defined by a real world object. This object is termed a location, and its identifier is termed a geographic identifier (CEN/TC-287, 1998). These geographic identifiers are very common in organisational databases, allowing the integration of the spatial component associated with them in the process of knowledge discovery.

The adoption of an indirect geographic reference system imposes the use of qualitative spatial reasoning strategies, able to deal with the spatial semantics not explicitly associated with the adopted geographic identifiers. Spatial reasoning is the process by which information about objects in space and their relationships is gathered through measurement, observation or inference, and used to arrive at valid conclusions regarding the objects’ relationships (Sharma, 1996). Qualitative spatial reasoning (Abdelmoty and El-Geresy, 1995) is based on the manipulation of qualitative spatial relations, for which composition tables facilitate reasoning, allowing the inference of new spatial knowledge.

Spatial relations have been classified into several types (Frank, 1996; Papadias and Sellis, 1994), including direction relations (Frank, 1996; Freksa, 1992), which describe order in space, distance relations (Hernández et al., 1995), which describe proximity in space, and topological relations (Egenhofer, 1994), which describe neighbourhood and incidence.
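To illustrate how a composition table supports inference, the following is a minimal sketch, not PADRÃO's actual table. It assumes point objects and projection-based directions, each encoded as a pair of signs on the east-west and north-south axes. The names `SIGNS`, `add_signs`, `compose` and `COMPOSITION_TABLE` are hypothetical. Given the direction from A to B and from B to C, the sketch returns every direction from A to C that is consistent with both.

```python
from itertools import product

# Assumption: each projection-based direction between two points is encoded
# as a pair of signs (east-west, north-south). "Same" means equal position.
SIGNS = {
    "N": (0, 1), "NE": (1, 1), "E": (1, 0), "SE": (1, -1),
    "S": (0, -1), "SW": (-1, -1), "W": (-1, 0), "NW": (-1, 1),
    "Same": (0, 0),
}
NAMES = {signs: name for name, signs in SIGNS.items()}


def add_signs(s1, s2):
    """Possible signs of the sum of two numbers whose signs are s1 and s2."""
    if s1 == 0:
        return {s2}
    if s2 == 0 or s1 == s2:
        return {s1}
    return {-1, 0, 1}  # opposite signs: the sign of the sum is undetermined


def compose(r1, r2):
    """Possible directions from A to C, given A->B is r1 and B->C is r2."""
    (x1, y1), (x2, y2) = SIGNS[r1], SIGNS[r2]
    return {
        NAMES[(sx, sy)]
        for sx, sy in product(add_signs(x1, x2), add_signs(y1, y2))
    }


# The full composition table, built once and then used as a plain lookup.
COMPOSITION_TABLE = {(r1, r2): compose(r1, r2) for r1 in SIGNS for r2 in SIGNS}

if __name__ == "__main__":
    print(COMPOSITION_TABLE[("N", "E")])   # a single possible direction
    print(COMPOSITION_TABLE[("N", "S")])   # several candidates remain
```

Some entries hold a single direction, so the inference is exact. Others hold several candidates, which is the usual price of reasoning without co-ordinates.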
These spatial relations are briefly described in the next subsections.

Direction relations

Direction relations describe where objects are placed relative to each other. Three elements are needed to establish an orientation: two objects and a fixed point of reference (usually the North Pole) (Frank, 1996; Freksa, 1992). Cardinal directions can be expressed using numerical values specifying degrees (0°, 45°, ...) or using qualitative values or symbols, such as North or South, which have an associated acceptance region. The regions of acceptance for qualitative directions can be obtained by projections (also known as half-planes) or cone-shaped regions (Figure 1).

Figure 1: Definition of directions by projection and cone-shaped systems
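The difference between the two kinds of acceptance region can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the definition used in PADRÃO: objects are reduced to points with (x, y) co-ordinates, x grows eastwards and y northwards, the cone-shaped system uses eight cones of 45° centred on the compass bearings, and the projection system has no neutral zone. All function names are hypothetical.

```python
import math

# Assumption: eight cone-shaped acceptance regions of 45 degrees each,
# centred on the bearings 0 (North), 45 (North-East), and so on clockwise.
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def _offsets(origin, target):
    """East-west and north-south offsets of target relative to origin."""
    dx = target[0] - origin[0]  # positive towards the East
    dy = target[1] - origin[1]  # positive towards the North
    if dx == 0 and dy == 0:
        raise ValueError("direction is undefined for coincident points")
    return dx, dy


def bearing(origin, target):
    """Bearing from origin to target in degrees, clockwise from North."""
    dx, dy = _offsets(origin, target)
    return math.degrees(math.atan2(dx, dy)) % 360.0


def cone_direction(origin, target):
    """Qualitative direction using cone-shaped acceptance regions."""
    # Shift by half a cone so that each cone is centred on its bearing.
    sector = int(((bearing(origin, target) + 22.5) % 360.0) // 45.0)
    return DIRECTIONS[sector]


def projection_direction(origin, target):
    """Qualitative direction using projections (half-planes) on each axis."""
    dx, dy = _offsets(origin, target)
    north_south = "N" if dy > 0 else "S" if dy < 0 else ""
    east_west = "E" if dx > 0 else "W" if dx < 0 else ""
    return north_south + east_west


if __name__ == "__main__":
    here, there = (0, 0), (1, 5)
    print(cone_direction(here, there))        # falls inside the North cone
    print(projection_direction(here, there))  # lies in the North-East quadrant
```

The same pair of points can receive different qualitative directions in the two systems. A target slightly east of due north falls in the North cone, but in the North-East quadrant of the projection system.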